91 research outputs found

    Detecting and indexing moving objects for Behavior Analysis by Video and Audio Interpretation

    2012 - 2013
    In the last decades we have witnessed a growing need for security in many public environments. According to a study recently conducted by the European Security Observatory, one half of the entire population is worried about crime and looks to law enforcement for protection. This consideration has led to the proliferation of cameras and microphones, which represent a suitable solution given their relatively low maintenance cost, the possibility of installing them virtually everywhere and, finally, the capability of analysing more complex events. However, the main limitation of these traditional audio-video surveillance systems lies in the so-called psychological overcharge of the human operators responsible for security, which reduces their capacity to analyse raw data flows from multiple sources of multimedia information; indeed, as stated by a study conducted by Security Solutions magazine, after 12 minutes of continuous video monitoring a guard will often miss up to 45% of screen activity, and after 22 minutes up to 95% is overlooked. For the above-mentioned reasons, it would be really useful to have an intelligent surveillance system able to provide images and video with a semantic interpretation, trying to bridge the gap between their low-level representation in terms of pixels and the high-level, natural language description that a human would give of them. Moreover, this kind of system, able to automatically understand the events occurring in a scene, would also be useful in other application fields, mainly oriented to marketing purposes. Especially in the last years, many business intelligence applications have been installed to assist decision makers and to give an organization's employees, partners and suppliers easy access to the information they need to do their jobs effectively... [edited by author]

    Learning skeleton representations for human action recognition

    Automatic interpretation of human actions has gained strong interest among researchers in pattern recognition and computer vision because of its wide range of applications, such as social and home robotics, health care for the elderly, and surveillance, among others. In this paper, we propose a method for the recognition of human actions by analysis of skeleton poses. The method that we propose is based on novel trainable feature extractors, which can learn the representation of prototype skeleton examples and can be employed to recognize skeleton poses of interest. We combine the proposed feature extractors with an approach for the classification of pose sequences based on string kernels. We carried out experiments on three benchmark data sets (MIVIA-S, MSRSDA and MHAD) and the results that we achieved are comparable to or better than those obtained by other existing methods. A further important contribution of this work is the MIVIA-S dataset, which we collected and made publicly available.
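The abstract does not give implementation details of the string-kernel step, but the idea can be illustrated with a generic p-spectrum string kernel: each frame's pose is first mapped to the label of its nearest learned prototype, turning an action into a string of symbols, and two actions are compared by counting shared length-p substrings. The symbols, example sequences and `spectrum_kernel` helper below are illustrative sketches, not the authors' code:

```python
from collections import Counter

def spectrum_kernel(s, t, p=2):
    """p-spectrum string kernel: counts shared length-p substrings."""
    a = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    b = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(a[k] * b[k] for k in a if k in b)

# Each action is a string of prototype labels, one symbol per frame.
wave  = "AABBBCC"
wave2 = "AABBCCC"
walk  = "DDEEDDE"

print(spectrum_kernel(wave, wave2))  # → 7: similar actions share many bigrams
print(spectrum_kernel(wave, walk))   # → 0: dissimilar actions share none
```

The resulting kernel matrix can be fed to any kernel classifier (e.g. an SVM) to label whole pose sequences.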

    Audio Surveillance of Roads: A System for Detecting Anomalous Sounds

    In recent decades, several systems based on video analysis have been proposed for automatically detecting accidents on roads to ensure a quick intervention of emergency teams. However, in some situations the visual information is not sufficient or sufficiently reliable, whereas the use of microphones and audio event detectors can significantly improve the overall reliability of surveillance systems. In this paper, we propose a novel method for detecting road accidents by analyzing audio streams to identify hazardous situations such as tire skidding and car crashes. Our method is based on a two-layer representation of an audio stream: at a low level, the system extracts a set of features able to capture the discriminant properties of the events of interest, and at a high level, a representation based on a bag-of-words approach is exploited in order to detect both short and sustained events. The deployment architecture for using the system in real environments is discussed, together with an experimental analysis carried out on a data set made publicly available for benchmarking purposes. The obtained results confirm the effectiveness of the proposed approach.
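The two-layer representation can be sketched in a few lines: low-level frame features are quantized against a codebook of "audio words" and a window of frames is summarized as a normalized word histogram that a classifier can score. Everything below is a toy illustration under stated assumptions; the random features and random codebook stand in for real low-level descriptors and a learned (e.g. k-means) codebook, and `bow_histogram` is a hypothetical helper, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for low-level frame features: 200 frames, 4-D vectors.
frames = rng.normal(size=(200, 4))

# Codebook of K "audio words" (in practice learned from training data).
K = 8
codebook = rng.normal(size=(K, 4))

# Quantize: each frame is assigned the index of its nearest codeword.
dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
words = dists.argmin(axis=1)

def bow_histogram(words, start, length, K):
    """Bag-of-words over a sliding window of quantized frames."""
    h = np.bincount(words[start:start + length], minlength=K)
    return h / h.sum()  # normalize so windows of any length are comparable

print(bow_histogram(words, 0, 50, K))
```

Short events (a crash) and sustained events (prolonged skidding) then differ simply in the window length over which the histogram is computed.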

    An ensemble of rejecting classifiers for anomaly detection of audio events

    Audio analytic systems are receiving increasing interest in the scientific community, not only as stand-alone systems for the automatic detection of abnormal events through interpretation of the audio track, but also in conjunction with video analytics tools for strengthening the evidence of anomaly detection. In this paper we present an automatic recognizer of a set of abnormal audio events that works by extracting suitable features from the signals obtained by microphones installed in a surveilled area, and by classifying them using two classifiers that operate at different time resolutions. An original aspect of the proposed system is the estimation of the reliability of each response of the individual classifiers. In this way, each classifier is able to reject the samples whose overall reliability falls below a threshold. This approach allows our system to combine only reliable decisions, so increasing the overall performance of the method. The system has been tested on a large dataset of samples acquired from real-world scenarios; the audio classes of interest are gunshots, screams and glass breaking, in addition to background sounds. The preliminary results obtained encourage further research in this direction.
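The reject-and-combine scheme can be illustrated with a minimal sketch. Here reliability is approximated by the top class probability of each classifier, which is a common proxy; the paper's actual reliability estimator may differ, and the function and example probability vectors below are hypothetical:

```python
import numpy as np

def classify_with_reject(prob_vectors, threshold=0.6):
    """Combine classifier outputs, keeping only 'reliable' ones.

    prob_vectors: list of per-classifier class-probability arrays.
    A classifier whose top probability is below the threshold
    abstains (rejects) for this sample.
    """
    reliable = [p for p in prob_vectors if p.max() >= threshold]
    if not reliable:
        return None  # all classifiers rejected: defer to a human operator
    combined = np.mean(reliable, axis=0)
    return int(combined.argmax())

# Classes: 0=background, 1=gunshot, 2=scream, 3=glass breaking
c_short = np.array([0.05, 0.80, 0.10, 0.05])  # confident: gunshot
c_long  = np.array([0.30, 0.30, 0.25, 0.15])  # unreliable: rejected

print(classify_with_reject([c_short, c_long]))  # → 1 (only c_short counts)
```

Combining only the non-rejected responses is what lets the ensemble discard one time resolution when it is unsure while still trusting the other.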

    Multi-Object Tracking by Flying Cameras Based on a Forward-Backward Interaction

    The automatic analysis of images acquired by cameras mounted on board drones (flying cameras) is attracting many scientists working in the field of computer vision; the interest is related to the increasing need for algorithms able to understand the scenes acquired by flying cameras, by detecting the moving objects, calculating their trajectories, and finally understanding their activities. The problem is made challenging by the fact that, in the most general case, the drone flies without any awareness of the environment; thus, no initial set-up configuration based on the appearance of the area of interest can be used to simplify the task, as generally happens when working with fixed cameras. Moreover, the apparent movements of the objects in the images are superimposed on those generated by the camera itself, associated with the flight of the drone (variations in altitude, speed, and the angles of yaw and pitch). Finally, it has to be considered that the algorithm should involve simple visual computational models, as the drone can only host embedded computers with limited computing resources. This paper proposes a detection and tracking algorithm based on a novel paradigm that suitably combines a forward tracking stage based on local data association with a backward chain, aimed at automatically tuning the operating parameters frame by frame, so as to be totally independent of the visual appearance of the flying area. This also eliminates any time-consuming manual configuration procedure by a human operator. Although the method is self-configured and requires low computational resources, its accuracy on a wide data set of real videos demonstrates its applicability in real contexts, even running on embedded platforms. Experimental results are given on a set of 53 videos and more than 60 000 frames.
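The abstract describes the forward-backward paradigm only at a high level; a minimal sketch of the idea, assuming greedy nearest-neighbour data association in the forward pass and a backward step that re-tunes the association gate from the match distances actually observed (so no scene-specific parameter is set by hand), could look like this. The `associate` and `update_gate` helpers and their parameters are illustrative assumptions, not the paper's algorithm:

```python
import math

def associate(prev, curr, gate):
    """Greedy nearest-neighbour association between two detection sets."""
    matches, used = [], set()
    for i, p in enumerate(prev):
        best, best_d = None, gate
        for j, c in enumerate(curr):
            if j in used:
                continue
            d = math.dist(p, c)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches.append((i, best, best_d))
            used.add(best)
    return matches

def update_gate(gate, matches, margin=2.0):
    """Backward step: adapt the gate to the motion scale actually observed."""
    if not matches:
        return gate
    mean_d = sum(d for _, _, d in matches) / len(matches)
    return margin * mean_d + 1e-6

prev = [(10.0, 10.0), (50.0, 50.0)]
curr = [(12.0, 11.0), (51.0, 49.0), (200.0, 200.0)]
m = associate(prev, curr, gate=20.0)
print(m)                     # two matches; the far detection is a new object
print(update_gate(20.0, m))  # gate shrinks toward the observed motion scale
```

Re-estimating the gate every frame is what makes such a tracker self-configuring under camera motion that changes the apparent object displacement.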

    IEEE ACCESS SPECIAL SECTION EDITORIAL: MULTIMEDIA ANALYSIS FOR INTERNET-OF-THINGS

    Big data processing includes both data management and data analytics. The data management step requires efficient cleaning, knowledge extraction, and integration and aggregation methods, whereas Internet-of-Multimedia-Things (IoMT) analysis is based on knowledge modeling and interpretation, which is increasingly performed by exploiting deep learning architectures. In the past couple of years, merging conventional and deep learning methodologies has exhibited great promise in ingesting multimedia big data, exploring the paradigms of transfer learning, association rule mining, and predictive analytics.

    Detecting and Indexing Moving Objects for Behavior Analysis by Video and Audio Interpretation

    No full text
    In the last decades we have witnessed a growing need for security in many public environments. This consideration has led to the proliferation of cameras and microphones. However, the main limitation of these traditional audio-video surveillance systems lies in the so-called psychological overcharge of the human operators responsible for security, which reduces their capacity to analyse raw data flows from multiple sources of multimedia information. For the above-mentioned reasons, in this thesis we propose an intelligent surveillance system able to provide images and video with a semantic interpretation, trying to bridge the gap between their low-level representation in terms of pixels and the high-level, natural language description that a human would give of them. In particular, the proposed framework starts by analysing the videos and extracting the trajectories of the objects populating the scene (tracking module): it is important to underline that the trajectory is a very discriminant feature, since the movement of objects in a scene is not random, but has an underlying structure that can be exploited to build models. Once extracted, this large set of trajectories needs to be indexed and properly stored in order to improve the overall performance of the system during the retrieval step (storing and retrieval module). Furthermore, the human operator is informed as soon as an abnormal behaviour occurs (visual behaviour understanding module). When the information extracted from the videos is not sufficient or not sufficiently reliable, the proposed system is enriched by a module in charge of recognizing audio events, such as gunshots, screams or glass breaking (audio recognition module). It is worth pointing out that the integration of audio- and video-based information is a significant add-on for the proposed framework, being a completely novel aspect in the field of video and audio analysis. Each proposed module has been tested both on standard datasets and in real environments; the promising results obtained confirm the advance with respect to the state of the art, as well as the applicability of the proposed method in real scenarios.
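The storing and retrieval module is not described in detail, but one common scheme for indexing trajectories for fast spatial retrieval is a grid-cell inverted index: each trajectory is quantized into the sequence of grid cells it crosses, and an index maps each cell to the trajectories passing through it. The cell size, helpers and example trajectories below are hypothetical, offered only as a sketch of the idea:

```python
from collections import defaultdict

CELL = 10.0  # grid cell size (hypothetical; tuned per scene)

def cells(traj):
    """Quantize a trajectory into the distinct grid cells it crosses."""
    seen, out = set(), []
    for x, y in traj:
        c = (int(x // CELL), int(y // CELL))
        if c not in seen:
            seen.add(c)
            out.append(c)
    return out

# Inverted index: grid cell -> ids of trajectories passing through it.
index = defaultdict(set)

def store(tid, traj):
    for c in cells(traj):
        index[c].add(tid)

def query(region_cell):
    """Retrieve all trajectories that crossed a given cell."""
    return sorted(index[region_cell])

store(1, [(1, 1), (12, 3), (25, 4)])
store(2, [(90, 90), (95, 88)])
print(query((1, 0)))  # → [1]: only trajectory 1 crossed cell (1, 0)
```

A query over a region then reduces to a union of a few cell lookups instead of a scan over every stored trajectory.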